Language Variation as a Context for Information Retrieval

Authors

  • Ahmed Abdelali
  • Jim Cowie
  • Hamdy S. Soliman
Abstract

Speakers of widespread languages may encounter problems in information retrieval and document understanding when they access documents written in the same language but originating in another country. The work described here focuses on developing resources that support improved document retrieval and understanding for users of Modern Standard Arabic (MSA). The lexicon of an Egyptian Arabic speaker and the lexicon of an Algerian Arabic speaker overlap, but many lexical tokens are not shared or mean different things to the two speakers. These differences provide a context for information retrieval that can improve retrieval performance and also enhance document understanding after retrieval. The availability of a suitable corpus is key to much objective research. In this paper we present the results of experiments in building a corpus for MSA using data available on the World Wide Web. We selected samples of newspapers published online in different Arabic-speaking countries. We demonstrate the completeness and representativeness of this corpus using standard metrics and show its suitability for language engineering experiments. The results of the experiments show that it is possible to link an Arabic document to a specific region based on information induced from its vocabu-
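
The idea of linking a document to a region through its vocabulary can be illustrated with a minimal sketch. This is not the authors' method, which the abstract does not specify; it assumes a simple unigram model with add-one smoothing, and the function names (`build_profiles`, `score`, `guess_region`), the region labels, and the placeholder tokens are all hypothetical.

```python
import math
from collections import Counter

def build_profiles(docs_by_region):
    """Count token frequencies per region from whitespace-tokenized documents."""
    return {
        region: Counter(tok for doc in docs for tok in doc.split())
        for region, docs in docs_by_region.items()
    }

def score(doc, profile, vocab_size):
    """Log-likelihood of doc under a unigram model with add-one smoothing."""
    total = sum(profile.values())
    return sum(
        math.log((profile[tok] + 1) / (total + vocab_size))
        for tok in doc.split()
    )

def guess_region(doc, profiles):
    """Return the region whose vocabulary profile best explains the document."""
    # Shared vocabulary across all regions (iterating a Counter yields its keys).
    vocab = set().union(*profiles.values())
    return max(profiles, key=lambda r: score(doc, profiles[r], len(vocab)))

# Toy data with made-up placeholder tokens, NOT real Arabic vocabulary:
# some tokens are shared across regions, others occur in only one.
TOY_CORPUS = {
    "region_a": ["shared1 shared2 a_only1", "shared1 a_only2 a_only1"],
    "region_b": ["shared1 shared2 b_only1", "shared2 b_only2 b_only1"],
}

profiles = build_profiles(TOY_CORPUS)
print(guess_region("shared1 a_only1", profiles))
print(guess_region("shared2 b_only2", profiles))
```

In this sketch, tokens seen in only one region pull the score toward that region, while shared tokens contribute little to the decision. A real system would need proper Arabic tokenization and a much larger corpus per region.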

Similar resources

Context-based Information seeking behavior among students of Kharazmi University

Background and Aim: The present study surveys contextualized information retrieval behavior among the students of Kharazmi University. Methods: This is descriptive applied research. The statistical population includes all students studying at Kharazmi University at the time of the research. The research sample includes 196 students selected by convenience sampling...

Improved Skips for Faster Postings List Intersection

Information retrieval can be achieved through computerized processes that generate a list of relevant responses to a query. The document processor, matching function, and query analyzer are the main components of an information retrieval system. Document retrieval systems are fundamentally based on Boolean, vector-space, probabilistic, and language models. In this paper, a new methodology for mat...

Context-Aware Recommender Systems: A Review of the Structure Research

Recommender systems are a branch of retrieval and information-matching systems that, by identifying the interests and requirements of the user, help users reach the desired information or service among a massive selection of choices. In recent years, recommender systems have applied contextual information about the user, such as location, time, and task, in order to produce re...

Investigating the Effects of Stemming on Information Retrieval in the Persian Language

Using language-specific behavior in information retrieval systems can significantly improve the quality of the retrieved results. The part of a word that remains after removing its affixes is called the stem. Stemming can be used to improve the relevance of the results in an information retrieval system. Different morphological variants of words (plural, past tense…) will be mapped into t...

Publication date: 2005